Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets.
Introduction

Mobile phones, wearable devices, and smart homes are just a few of the modern distributed networks generating massive amounts of data each day. Due to the growing storage and computational power of devices in these networks, it is increasingly attractive to store data locally and push more network computation to the edge. The nascent field of federated learning explores training statistical models directly on devices [37]. Examples of potential applications include: learning sentiment, semantic location, or activities of mobile phone users; predicting health events like low blood sugar or heart attack risk from wearable devices; or detecting burglaries within smart homes [3,39,42]. Following [25,36,26], we summarize the unique challenges of federated learning below.

1. Statistical Challenges: The aim in federated learning is to fit a model to data, $\{X_1, \ldots, X_m\}$, generated by $m$ distributed nodes. Each node, $t \in [m]$, collects data in a non-IID manner across the network, with the data on each node being generated by a distinct distribution $X_t \sim P_t$. The number of data points on each node, $n_t$, may also vary significantly, and there may be an underlying structure present that captures the relationship amongst nodes and their associated distributions.
2. Systems Challenges: There are typically a large number of nodes, $m$, in the network, and communication is often a significant bottleneck. Additionally, the storage, computational, and communication capacities of each node may differ due to variability in hardware (CPU, memory), network connection (3G, 4G, WiFi), and power (battery level). These systems challenges, compounded with unbalanced data and statistical heterogeneity, make issues such as stragglers and fault tolerance significantly more prevalent than in typical data center environments.

In this work, we propose a modeling approach that differs significantly from prior work on federated learning, where the aim thus far has been to train a single global model across the network [25,36,26]. Instead, we address the statistical challenges of the federated setting by learning separate models for each node, $\{w_1, \ldots, w_m\}$. This can be naturally captured through a multi-task learning (MTL) framework, where the goal is to fit separate but related models simultaneously, one per node.
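For concreteness, a standard regularized MTL objective over the per-node models takes the following form (a sketch of the general setup; the loss $\ell_t$, weight matrix $W = [w_1, \ldots, w_m]$, and relationship matrix $\Omega$ are notation assumed here, ahead of the paper's precise formulation):

$$
\min_{W,\,\Omega} \; \sum_{t=1}^{m} \sum_{i=1}^{n_t} \ell_t\!\left(w_t^{\top} x_t^i,\; y_t^i\right) \;+\; \mathcal{R}(W, \Omega)\,,
$$

where node $t$ contributes $n_t$ labeled examples $(x_t^i, y_t^i)$, and the regularizer $\mathcal{R}$ couples the per-node models through $\Omega$, which captures the underlying structure amongst nodes and their distributions.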
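To make the statistical setting concrete, the following minimal sketch simulates non-IID, unbalanced data across $m$ nodes and fits an independent linear model per node. All names and parameter values are illustrative assumptions, and this per-node baseline is not the method proposed in this work, which additionally models the relationships between the $w_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 10, 5  # number of nodes and feature dimension (illustrative values)

# Each node t draws from its own distribution P_t (a distinct mean shift)
# and has its own sample size n_t: the non-IID and unbalanced properties.
true_w = rng.normal(size=d)
nodes = []
for t in range(m):
    n_t = rng.integers(20, 200)              # sample sizes vary per node
    shift = rng.normal(scale=2.0, size=d)    # distinct distribution per node
    X_t = rng.normal(loc=shift, size=(n_t, d))
    y_t = X_t @ (true_w + 0.1 * rng.normal(size=d)) + rng.normal(size=n_t)
    nodes.append((X_t, y_t))

# Baseline: fit a separate ridge-regression model w_t on each node's local
# data only, with no coupling between nodes (unlike the MTL objective above).
lam = 0.1
W = np.stack([
    np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    for X, y in nodes
])
print(W.shape)  # (m, d): one model per node
```

Fitting each $w_t$ in isolation is reasonable when $n_t$ is large, but nodes with little data overfit; the MTL formulation instead shares statistical strength across nodes through the regularizer.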